TR  90019 


UNLIMITED 


AD-A227  864 


ROYAL  AEROSPACE  ESTABLISHMENT 




Technical  Report  TR  90019 

March  1990 


A  Subjective  Rating  Scale  for  Assessing 

Pilot  Workload  in  Flight: 

A  Decade  of  Practical  Use 


by 


A.  H.  Roscoe 
G.  A.  Ellis 




Procurement  Executive,  Ministry  of  Defence 
Farnborough,  Hampshire 




ROYAL  AEROSPACE  ESTABLISHMENT 

Technical  Report  90019 
Received  for  printing  5  March  1990 


A SUBJECTIVE RATING SCALE FOR ASSESSING PILOT WORKLOAD IN FLIGHT:

A DECADE OF PRACTICAL USE

by

A. H. Roscoe*

G. A. Ellis**


SUMMARY 


Despite the many techniques developed for evaluating pilot workload in
flight, subjective assessment by experienced pilots is still the most reliable
method by far. This report describes the design and development - with the help
of practising test pilots - of a ten-point rating scale. The scale uses a
decision tree similar to that used by the Cooper-Harper Handling Qualities scale,
and is based on the concept of spare capacity. Examples are given of its use by
a large number of pilots in various flight trials and workload studies.



Departmental  Reference:  EM  6 


Copyright
©
Controller HMSO London
1990


*  Formerly of RAE, Bedford, and now with Britannia Airways
** Formerly of RAE, Bedford, and now with British Aerospace


LIST OF CONTENTS

1  INTRODUCTION

2  THE 'BEDFORD' WORKLOAD SCALE

2.1  Design and development of the 'Bedford' scale

2.2  Description of the scale (Fig 1)

3  PRACTICAL EXPERIENCE WITH THE 'BEDFORD' SCALE

4  DISCUSSION

5  CONCLUSIONS

References

Illustration: Figure 1 (in text)

Report documentation page: inside back cover




1  INTRODUCTION 

In 1969 the increasing importance to flight safety of changes in levels of
pilot workload generated by new operating techniques such as vertical take-off
and landing (VTOL), low visibility landings, and reduced noise approaches, then
being evaluated at the Royal Aircraft Establishment at Bedford, resulted in a
greater interest in assessing workload during flight testing. Unstructured pilot
opinion recorded during flight or, more often, after flight was the accepted
method of evaluating workload at this time. The possibility of obtaining
misleading information because of bias or of pre-conceived notions about workload
levels - a recognised problem associated with subjective techniques - resulted in
a programme aimed at developing a complementary but more independent measure.
Following a detailed survey of available techniques, monitoring of pilots' heart
rates was selected for further evaluation.


After some 5 years, during which considerable experience was gained of
using test pilots' heart rates to support their opinions of workload, it was
decided to improve the method of obtaining subjective assessments by employing a
specially designed rating scale. A search for a suitable scale for use in flight
was unsuccessful; most research on subjective assessment by pilots had been
concerned with aircraft handling qualities1-3. Although some scales, such as that
designed by Cooper and Harper3, have sometimes been used to rate workload they are
not ideal for this purpose. As Gerathewohl4 pointed out: "... subjective pilot
ratings of handling qualities, as accurate as they may be in regard to control
desirability or difficulty, do not contribute to workload determinations, since
they are only loosely connected to task demands and pilot response". The
decision was made, therefore, to design and develop a workload rating scale at
RAE Bedford 'on the back' of current flight testing and with the help of
practising test pilots.

As well as designing a pilot workload rating scale it seemed sensible to
define workload. A review of the literature revealed a plethora of definitions
based mostly on workload as a set of flight task demands, as the effort required
to satisfy these demands, or as the results of that effort - performance. Many
of the definitions appeared complicated and/or unrealistic in the context of real
flight.


Using a questionnaire, Ellis and Roscoe5 obtained the views of some 31*0
military and airline pilots and concluded that more than 80% of professional
pilots think of workload in terms of effort. This is also an interpretation that
agrees well with the influence of such individual factors as natural ability,
training, and experience on the piloting task. There is evidence that the
failure of pilots to perceive the demands of the flight task correctly has been a
causative factor in several accidents; it also seems likely that workload levels
tend to be determined by how a pilot assesses the flight task. With these
findings in mind Ellis and Roscoe proposed that a slight modification to the
definition of workload used by Cooper and Harper in the introduction to their
Handling Qualities Rating Scale3 would be appropriate, namely: "Pilot workload is
the integrated mental and physical effort required to satisfy the perceived
demands of a specified flight task".

2  THE 'BEDFORD' WORKLOAD SCALE

2.1  Design and development of the 'Bedford' scale

The  initial  objective  was  to  design  an  interval  scale  that  could  be  used  to 
give  ratings  in  flight  during  -  or  immediately  following  -  highly  demanding 
piloting  tasks,  and  that  would  result  in  absolute  values  of  workload. 

As Ellis6 observed: "The use of rating scales results in the allocation of
a numerical value to the quantity that is being measured. Not unnaturally,
researchers wish to use statistical and mathematical processes on the numbers so
obtained, and so most of the rating scales that have been devised have been
intended to be linear". However, McDonnell7, in discussing the rating of
aircraft handling qualities, referred to the difficulty of achieving linearity
with ordinal and adjectival scales. And Hess8 later commented: "The majority of
rating scales in existence have two things in common; they are both ordinal and
adjectival in nature". Furthermore, the results of various other laboratory
studies9,10 aimed at developing non-adjectival linear rating scales were not
encouraging.

It became obvious that an ordinal, adjectival rating scale would be most
appropriate for development at a flight test centre lacking facilities for
laboratory experiments. An ordinal scale for aircraft handling qualities, the
Cooper-Harper scale3, was already developed and established; it was easy to use
and was widely accepted amongst test pilots and engineers. It therefore made
good sense to try to design the workload scale using a similar design of decision
tree with appropriate descriptors.

The first design, a nine-point scale, used descriptors based on 'effort'
such as: Pilot effort not a factor for desired performance - rating 1; desired
performance requires moderate pilot effort - rating 3; adequate performance
requires extensive pilot effort - rating [illegible]; intensive pilot effort is
required to retain control - rating 8; and finally, control will be lost during
some portions of required operation - rating 9.


At  first  it  was  not  obvious  whether  the  scale  should  be  absolute  or 
comparative.  In  other  words,  should  it  try  to  cover  all  possible  workload  levels 
in  all  flying  tasks?  Or  should  it  have  the  more  limited  aim  of  acting  as  a 
comparative  measure  between  the  workload  experienced  and  that  which  could  be 
considered  normal  or  reasonable  for  the  task  in  hand?  It  was  therefore  decided  to 
construct  both  types  of  scale,  and  then  to  decide  which  of  them  would  be  the  more 
appropriate  in  various  circumstances. 

An  interesting  finding  from  the  questionnaire  study  by  Ellis  and  Roscoe5 
was  that  pilots  find  it  convenient  to  think  in  terms  of  'spare  capacity'  when 
considering  their  levels  of  workload.  What  other  relevant  but  secondary  tasks 
can  be  taken  on  in  addition  to  the  primary  flight  task?  For  example,  when  the 
primary  task  is  an  instrument  approach  in  bad  weather  how  much  spare  capacity 
does one consider is available for monitoring the actions of the other pilot,
looking  outside  the  cockpit,  listening  to  the  radio  etc?  The  higher  the  workload 
generated  by  the  primary  task  the  less  capacity  there  is  for  these  secondary 
tasks.  Pilots  seem  to  find  it  a  relatively  simple  matter  to  judge  how  much  more 
they  could  do  even  if  there  is  no  requirement  to  do  so. 

These  findings  suggested  that  descriptors  incorporating  the  concept  of 
spare capacity would be of greater value than reference to effort. The scales
were  also  extended  from  nine  to  ten  ratings. 

In addition to the concept of spare capacity the 'absolute' scale also
referred to arousal, time, and fatigue. These new descriptors included:

Low workload, plenty of time and capacity to complete all tasks at a
moderate arousal state, level of effort could be maintained for several hours; -
rating 2.

Moderate workload, all primary and secondary tasks within pilot capacity,
but fairly high arousal state needed; tiring and fatigue likely after
1-2 hours; - rating 3.

Very high workload, only more important secondary tasks completed and then
only infrequently; - rating 6.

In view of the present concern about underarousal and underload it is worth
noting that the lowest workload descriptor in this scale read: Very low workload,
few tasks for the time available, some risk of boredom; - rating 1. This was
shortly amended to: 'Workload too low, too much spare capacity, danger of
complacency'.




During  construction  of  the  scales  it  quickly  became  clear  that  any 
'absolute' scale that attempted to include the whole spectrum of workload levels
experienced by pilots would be too coarse to be practical. Certain tasks such as
gun aiming or landing in adverse weather were always concentrated within a few
seconds and were always high workload. Others, such as en-route flying, could be
sustained for many hours. To place them on the same scale was of very limited
value. Also, pilots' comments and ratings of workload made during various flight
tests  in  different  types  of  aircraft,  showed  the  difficulty  of  obtaining  absolute 
values.  It  was  found  that  pilots  liked  to  compare  their  workload  to  some  form  of 
baseline,  usually  to  previous  experience. 

Effort  was  therefore  concentrated  upon  developing  a  comparative  scale  that 
would  help  to  answer  the  important  and  practical  question  of  whether  the  workload 
is  appropriate  for  the  primary  task  under  consideration.  Subsequent  experience 
of using the scale has proved this decision to have been correct (see later).

The need for concise descriptors in an adjectival rating scale was
highlighted during the evaluation of the absolute scale in flight. It soon became
apparent that the introduction of the additional factors of arousal and fatigue
complicated the scale unduly. After some development in flight and further
discussions with pilots the present descriptors were introduced (Fig 1); these
were readily accepted and in a short time considered by Bedford test pilots to be
quite adequate for the purpose of rating workload.

2.2  Description of the scale (Fig 1)

The pilot starts his decision-making process at the bottom left corner of
the decision tree, which consists of three questions requiring yes or no answers,
in order to proceed to the descriptions of different levels of workload. The
descriptors are of increasing levels of workload associated with ratings of 1 to
10. Half ratings are allowed, thereby increasing the sensitivity of the scale;
this became particularly desirable at the lower workload levels. Originally half
ratings between the 'decision' groups were not sought but as many pilots seemed
to find it difficult to decide between 'yes' and 'no' for "Was workload
satisfactory without reduction?" a rating of 3½ became acceptable.

It is most important that the flight task to be rated should be well
defined and/or the period of time over which the assessment is to be made stated
with reasonable precision. The workload being assessed is that involved in the
execution of the primary task; any additional tasks - such as monitoring other
crew members - must be included as part of the pilot's spare capacity.


Decision Tree (entered at the bottom left; each question is answered Yes or No)

    Was it possible to complete the task?
        No  - rating 10
        Yes - Was workload tolerable for the task?
            No  - ratings 7 to 9
            Yes - Was workload satisfactory without reduction?
                No  - ratings 4 to 6
                Yes - ratings 1 to 3

Rating   Workload Description

  1      Workload insignificant.
  2      Workload low.
  3      Enough spare capacity for all desirable additional tasks.
  4      Insufficient spare capacity for easy attention to additional tasks.
  5      Reduced spare capacity. Additional tasks cannot be given the desired
         amount of attention.
  6      Little spare capacity: level of effort allows little attention to
         additional tasks.
  7      Very little spare capacity, but maintenance of effort in the primary
         task not in question.
  8      Very high workload with almost no spare capacity. Difficulty in
         maintaining level of effort.
  9      Extremely high workload. No spare capacity. Serious doubts as to
         ability to maintain level of effort.
 10      Task abandoned. Pilot unable to apply sufficient effort.

Fig 1  Pilot workload rating scale (for a specified piloting task)
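
The route through the decision tree of Fig 1 can be sketched in a few lines of Python. This is only an illustration of the logic described in section 2.2: the function names, argument names, and the half-rating check are assumptions made for the example and are not part of the report, which defines the scale on paper only. The descriptors are transcribed from Fig 1.

```python
# Descriptors transcribed from Fig 1. All identifiers below are illustrative
# assumptions; the report does not define any software interface.
DESCRIPTORS = {
    1: "Workload insignificant.",
    2: "Workload low.",
    3: "Enough spare capacity for all desirable additional tasks.",
    4: "Insufficient spare capacity for easy attention to additional tasks.",
    5: "Reduced spare capacity. Additional tasks cannot be given the desired "
       "amount of attention.",
    6: "Little spare capacity: level of effort allows little attention to "
       "additional tasks.",
    7: "Very little spare capacity, but maintenance of effort in the primary "
       "task not in question.",
    8: "Very high workload with almost no spare capacity. Difficulty in "
       "maintaining level of effort.",
    9: "Extremely high workload. No spare capacity. Serious doubts as to "
       "ability to maintain level of effort.",
    10: "Task abandoned. Pilot unable to apply sufficient effort.",
}


def rating_band(task_completed, workload_tolerable, satisfactory_without_reduction):
    """Return the whole-number ratings reached by the three yes/no answers."""
    if not task_completed:
        return [10]
    if not workload_tolerable:
        return [7, 8, 9]
    if not satisfactory_without_reduction:
        return [4, 5, 6]
    return [1, 2, 3]


def is_valid_rating(rating):
    """Accept whole and half ratings from 1 to 10, as the text allows."""
    return 1 <= rating <= 10 and rating * 2 == int(rating * 2)


if __name__ == "__main__":
    # Task completed, workload tolerable, but not satisfactory without reduction.
    for r in rating_band(True, True, False):
        print(r, DESCRIPTORS[r])
```

The three questions only narrow the choice to a band; the pilot still selects the descriptor within the band, and a half rating such as 3½ sits between two bands.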

3  PRACTICAL EXPERIENCE WITH THE 'BEDFORD' SCALE

The final version of the rating scale was first used to assess pilot
workload during the HS Harrier ski-jump take-off trial11. This trial assessed
the advantages of using an inclined ramp to improve the take-off performance of
ship-borne Harrier VTOL combat aircraft. The aircraft is accelerated on to the
ramp from a short run - typically 50-100 metres - with nozzles rotated rearwards.
At the top of the ramp, and on the point of becoming airborne, the nozzles are
rotated downwards to a pre-set angle. As conventional flying speed is approached
the nozzles are gradually rotated to the aft position again.

The Cooper-Harper scale was used to assess handling qualities and the
Bedford scale to estimate workload levels, the latter being augmented by
recording the pilot's heart rate.




Eleven  pilots  rated  their  workload  and  had  their  heart  rates  recorded 
during  ramp  take-offs;  the  ramp  angle  was  increased  in  steps  from  6°  to  15°  over 
the  period  of  the  trial.  Workload  ratings  and  heart  rates  showed  good  agreement; 
and  both  ratings  and  heart  rates  confirmed  that  workload  levels  were  not 
increased  for  greater  ramp  angles  nor  for  night  take-offs.  These  workload 
indicators  also  demonstrated  that  levels  of  workload  are  higher  during  the  more 
conventional  short  take-offs  (for  this  aircraft)  from  a  runway. 

The workload rating scale was used extensively during a trial to evaluate
Economic Category 3 approaches and landings12. Pilots' heart rates were again
recorded to complement their ratings of workload. The technique involved an
autopilot coupled approach to a decision height of 50 ft for the HS 748 and 60 ft
for the BAC 1-11 aircraft, at which height the autopilot was disconnected for a
manual landing if the runway lights were seen. If the lights were not seen by
decision height a go-around was made.

In addition to rating the final approach with autopilot, and the manual
landing, ratings were given for the very short term workload associated with
making the decision.

In late 1982 the Bedford scale and pilots' heart rate responses were used
to assess workload in flight during crew complement certification of the
BAe 14613. Post flight questionnaires complemented the in-flight data obtained
from the three teams (of two pilots) who each flew three days of intensive flight
schedules around a circuit of three high intensity airports (London - Heathrow,
Paris - Charles de Gaulle, and Amsterdam - Schiphol).

Workload ratings were obtained from both pilots and from an experienced
flight observer, on verbal request and light signal from the exercise [illegible],
by means of small keyboards fitted to the control column and to the observer's
clip board. Ratings, which were plotted automatically on the heart rate plots
at the time of the request, were requested according to a predetermined plan;
requests were more frequent during high workload phases of flight such as the
take-off and initial climb, the approach and landing, and when simulated
in-flight failures and emergencies occurred.

Pilots were instructed in the use of the rating scale before the exercise
started and, in particular, were asked to consider their workload for the
previous 30 s. All six pilots and most of the flight observers found the scale
easy to use; there was no evidence that giving ratings in flight intruded into
the piloting task and only rarely was a rating delayed by the flight demands.

Half ratings were not used during this trial and though the sensitivity of the
scale was consequently reduced it did not appear to influence its value for
certification purposes. In fact, the ratings obtained during the trial were
considered, overall, to be of considerable value; and there was also surprisingly
good agreement between ratings given by pilots and those given by most of the
flight observers.

There was a reasonably good relationship between pilots' ratings and their
heart  rate  responses;  disagreements  seemed  to  be  due  mostly  to  the  failure  of  the 
pilot  to  rate  the  entire  period  under  review. 

Lidderdale14 used the Bedford scale and heart rate recordings to assess
crew workload (pilot and navigator) during the evaluation of low-level high-speed
flight in a supersonic tactical fighter aircraft. He reported that "... the
aircrew understood the scale readily and whilst it was sufficiently comprehensive
to cover all circumstances it was easy to remember and small enough to be carried
on the flying suit knee-pad". At pre-planned times the navigator, having
recorded his own rating of workload, would request a rating from the pilot. Both
pilots and navigators reported little difficulty in giving ratings in flight. A
post-flight assessment technique, based on the Analytical Hierarchy Process15,
was used to analyse paired comparisons. Lidderdale, in reporting a high
correlation between in-flight ratings and the post-flight assessments, observed
that the former technique was easier to use and more practical for use in an
operational trial (personal communication).

The Bedford scale was used with success by Muir and fUwellu to assess
workload in army helicopter pilots engaged in various flight tasks; and by
Barnes during an investigation into the levels of workload and operating
conditions experienced by helicopter pilots involved in North Sea oil platform
flights.

Roscoe and Grieve18 used the Bedford scale together with heart rate
responses to compare the levels of pilot workload generated by the advanced
technology Boeing 767 with those generated by the earlier Boeing 737. This
Britannia Airways study was carried out during routine passenger flights in
Europe during which line-pilots found little difficulty in giving ratings at the
end of the particular flight phase or sub-phase of interest. An experienced flight
observer, using the Bedford scale, also rated the different flight tasks. Five
pilots participated in the study; three were monitored on the B737 and then,
after converting to type, on the B767; two pilots were monitored on both aircraft
as they alternated every six months between B737 and B767. Both workload ratings
and heart rate responses showed that the handling pilot's levels of workload are
lower on the 767 than on the 737. These measures have also distinguished quite
clearly between the different levels of workload associated with flying the 767
in different modes: hand-flying with raw information, hand-flying with flight
director integrated with the flight management system (FMS), and with autopilot
and autothrottle.

An extension to this study is presently under way in which workload levels
generated by flight failures, emergencies, and abnormal operating conditions in
the 767 are being assessed in a flight simulator. Both ratings and heart rates
for normal flight in the aircraft and simulator were used successfully to
demonstrate the value of the simulator for this type of investigation14.

Practical experience of using the workload rating scale in several flight
trials showed it to be markedly better than previous methods of obtaining pilot
opinion, and though it lacked some sensitivity - especially at the lower levels -
it was becoming well accepted by pilots and by research scientists alike.
Nevertheless, in 1983 it was decided to carry out a series of flights to demonstrate
the ability of the rating scale and of heart rate responses to distinguish
between four short flight tasks having, theoretically, three different levels of
difficulty. An HS125 twin 'business-jet' was used for the trial in which 12
experienced pilots flew a total of 15 sequences. Each sequence consisted of:

(a) A 360° turn in 2 min at constant altitude, IAS, and rate of turn.

(b) A 360° turn in 2 min with a simultaneous [word illegible] of 2000 ft in
altitude at a constant IAS and rate of turn.

(c) A 360° turn in 2 min with a simultaneous 2000 ft altitude loss
followed by a reverse 360° turn in 2 min with a simultaneous gain of
2000 ft at a constant IAS and rate of turn.

(d) A 360° turn in 2 min with a simultaneous altitude loss of 2000 ft and
speed reduction of 100 kn.

Each sequence was flown at a safe height in clear airspace. At first
performance was monitored by the non-handling pilot but as it soon became clear
that each pilot was determined to perform well in the presence of a colleague
this was discontinued. It was the intention to vary the order in which the tasks
were flown - but, operationally, it was much more convenient to fly each sequence
in the order 1-4 with the first task being repeated at the end of the sequence.
Heart rates were recorded throughout each sequence and workload ratings were
requested after each flight task. Three pilots flew the sequences twice. Those
pilots not current on the 125 were given at least 30 min familiarisation before
being asked to rate the tasks; similarly, those pilots unfamiliar with the rating
scale were given a full briefing beforehand.

Results  were  highly  encouraging  and  demonstrated  that  for  10  of  the 
12  pilots  both  workload  ratings  and  heart  rate  responses  were  able  reliably  to 
distinguish  between  the  different  tasks.  In  addition  there  was  a  very  good 
agreement  between  ratings  and  heart  rates  for  six  of  the  10  pilots  and  reasonably 
good  agreement  for  the  other  four. 

The  mean  ratings  and  range  for  each  flight  task  were  as  follows: 


Task    Mean Rating    Range

 1          4.8         3-6
 2          6.1         4-7
 3          7.1         5-8
 4          7.0         4-8
 1          5.0         3-6

On theoretical grounds it was questionable whether task 3, which lasted
twice as long as the other three, would be rated lower than task 4. In the
event, task 3 was rated higher than task 4 on three occasions, lower on five,
and the same on seven occasions. Although pilots were asked to rate the entire
task, half of the pilots gave two ratings for task 3.

Overall, there was no evidence of learning and reduced workload between
task 1 flown at the start and at the end of the sequence. Similarly, three
pilots who flew the sequence on two separate sorties did not show any evidence of
workload reduction with increasing familiarity.

4  DISCUSSION

The Bedford scale has now been in use for more than 10 years and though
designed primarily for use by test pilots has been used successfully by military
pilots, civil helicopter pilots, and airline pilots. The idea of 'spare
capacity' appeals to most pilots who report that it helps them to arrive at an
appropriate rating with relative ease; this is particularly valuable when first
using the scale. In practice, pilots appear to become familiar with the scale
remarkably quickly - most pilots then seem to think only in terms of numbers
without reference to the actual decision tree.

The advantage of being able to use a rating scale during, or shortly after,
a demanding flight task has been demonstrated many times during the 10 years -
especially during long flight sectors requiring many ratings. Post flight
ratings - even with the assistance of video recordings - must be less reliable.




The  authors  are  unaware  of  any  other  workload  rating  scale  that  has  been 
used  as  extensively  for  assessing  workload  in  the  'real  world';  nevertheless, 
there  are  a  number  of  possible  shortcomings. 

Several authors have underlined the importance of sensitivity and
diagnosticity, others have stressed the multidimensional nature of workload, and
rating scales having varying degrees of sophistication have been designed with
these issues in mind20-22. A subjective workload assessment technique (SWAT)
developed by Reid and his co-workers23,24 considers workload in
three-dimensional terms -
time  load,  mental  effort,  and  psychological  stress.  Hart  and  her  colleagues25,26 
have  developed  and  refined  a  multi-dimensional  rating  scale  consisting  finally  of 
six  subscales,  namely:  physical  demands,  mental  demands,  time  pressure,  own 
performance,  effort  and  frustration. 

Certainly,  in  many  situations  it  may  be  quite  important  to  be  able  to 
analyse  the  reasons  why  workload  has  changed.  The  Bedford  scale  does  not  have 
the  power  of  diagnosis  but  in  practice,  on  the  rare  occasions  when  diagnostic 
information  is  required,  post-flight  discussions  with  the  pilots  -  especially 
when  their  beat-to-beat  heart  rate  plots  are  used  as  an  aide  memoire  -  are 
proving  to  be  of  considerable  value.  And,  in  most  cases,  assessment  of  overall 
or  global  workload  is  all  that  is  required  -  for  example,  during  workload 
assessments  for  the  purpose  of  crew  complement  certification. 

The  lack  of  sensitivity  at  the  lower  end  of  the  scale  was  at  first  thought 
to  be  a  disadvantage,  but  experience  now  suggests  that  it  is  unrealistic  to 
strive  for  a  high  level  of  sensitivity.  This  is  particularly  so  in  view  of  the 
variations  in  subjective  evaluations  between  pilots  and,  occasionally,  within  the 
same  pilot  from  time  to  time.  In  addition,  the  cost  effectiveness  of  modifying 
systems or procedures to correct for small differences in satisfactory workload
is  questionable. 

The  nonlinear  nature  of  the  scale,  although  making  it  difficult  to  carry 
out  statistical  treatments  on  data,  does  not  seem  to  cause  any  problem  in 
practice. 

The decision to develop a scale giving relative - rather than absolute -
values of workload has not appeared to pose any problem from the practical point
of view. And it may be questionable whether absolute rating scales are really
necessary. Lidderdale14, in comparing in-flight with post-flight workload
assessments, wrote: "It is possible that all assessments of workload are made
from a baseline of comparison with other elements in the flight and, if this is
the case, all rating methods may be relative."




Examination  of  several  hundred  workload  ratings  given  during  a  variety  of 
flight  tasks  or  flight  phases  together  with  attempts  to  correlate  ratings  with 
heart  rate  responses  have  resulted  in  three  important  findings.  Firstly,  as 
suggested  by  Ellis6  in  1979,  it  was  not  possible  to  compare  ratings  for  different 
flight  tasks;  for  instance,  ratings  given  during  the  take-off  could  not  be 
related  to  those  given  during  the  approach  and  landing. 

Secondly, although reasonably consistent, ratings were highly individual
for each pilot. Unless some normalisation procedure was applied to the data - if
sufficient data were available - each subject pilot had to be considered as his
own control. A similar but more marked idiosyncrasy has already been identified
for heart rate responses25.
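The report does not say which normalisation procedure was used. As one possible illustration, assuming a simple within-pilot standardisation (z-scores), each pilot's ratings can be expressed relative to that pilot's own mean and spread, so that each pilot effectively acts as his own control. The function name and the data below are hypothetical.

```python
from statistics import mean, pstdev


def normalise_by_pilot(ratings_by_pilot):
    """Convert each pilot's ratings to z-scores against that pilot's own data.

    ratings_by_pilot maps a pilot identifier to a list of workload ratings.
    This is an assumed procedure, not the one used in the report.
    """
    normalised = {}
    for pilot, ratings in ratings_by_pilot.items():
        centre = mean(ratings)
        spread = pstdev(ratings)
        if spread == 0:
            # A pilot who always gives the same rating carries no relative information.
            normalised[pilot] = [0.0 for _ in ratings]
        else:
            normalised[pilot] = [(r - centre) / spread for r in ratings]
    return normalised


# Hypothetical ratings: pilot_b rates everything three points higher than pilot_a.
data = {"pilot_a": [2, 3, 4], "pilot_b": [5, 6, 7]}
result = normalise_by_pilot(data)
print(result)
```

After this step the two hypothetical pilots have the same normalised profile, even though their raw ratings differ by a constant offset.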

Finally, a good correlation between workload ratings and heart rate
responses for the same flight task was apparent in about 80% of pilots, but one in
five did not show any agreement. The reason for this lack of agreement has not
always been identified. Sometimes it was because a pilot had not integrated his
workload over the entire period of interest, but there is some evidence that it
may be related to the nature of that particular individual's heart rate response.
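The report does not state how the agreement between ratings and heart rate was computed. One plausible way to examine it for a single pilot and a single flight task, given that the ratings are ordinal, is a Spearman rank correlation. The sketch below is written under that assumption; the function names and the data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for one pilot on one flight task (illustrative only).
ratings = [2, 3, 3, 5, 6]
heart_rate = [78, 85, 88, 101, 110]  # beats per minute
rho = spearman(ratings, heart_rate)
print(round(rho, 3))
```

Computing the coefficient separately for each pilot, rather than pooling all pilots, is consistent with the finding above that each pilot has to be treated as his own control.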

5  CONCLUSIONS 

The Bedford workload rating scale was designed for use in the 'real world'
of practical flight testing some 10 years ago, and pilots have, without
exception, found it easy to use in flight. During the past decade, despite the
shortcomings referred to above, its value has been demonstrated in several flight
trials - especially when ratings have been augmented by recording the pilot's
heart rate.

However, in contrast to the studies carried out on some other rating
scales, the Bedford scale has not been subjected to a critical evaluation
in controlled laboratory experiments. Nevertheless, the authors believe
the scale in its present form is quite suitable for assessing workload in most
practical situations.


REFERENCES

No.  Author  Title, etc

1  R.P. Harper
   Pilot evaluation of handling qualities.
   Ninth Annual Human Factors Society Meeting (1965)

2  J.D. McDonnell
   An application of measurement methods to improve the quantitative
   nature of pilot rating scales.
   IEEE Transactions on Man-Machine Systems, 10, 81-92 (1969)

3  G.E. Cooper, R.P. Harper
   The use of pilot rating in the evaluation of aircraft handling qualities.
   NASA Technical Note TN-D-5153, Washington DC (1969)

4  S.J. Gerathewohl
   Definition and measurement of perceptual and mental workload in
   aircrew and operators of Air Force weapon systems.
   In: AGARD CP-181 Higher mental functioning in operational systems,
   AGARD, Paris (1976)

5  G.A. Ellis, A.H. Roscoe
   The airline pilot's view of flight deck workload: A preliminary study
   using a questionnaire.
   RAE Technical Memorandum FS(B) 465 (1982)

6  G.A. Ellis
   Subjective assessment, pilot opinion measures.
   In: AGARDograph No 233 Roscoe A H (Ed) Assessing Pilot Workload,
   AGARD, Paris (1978)

7  J.D. McDonnell
   Pilot rating techniques for the estimation and evaluation of handling
   qualities.
   AFFDL-TR-68-76, Wright-Patterson AFB, Ohio (1968)

8  R.A. Hess
   Non adjectival rating scales in human response experiments.
   Human Factors, 15, 235-280 (1973)

9  D.A. Spyker
   Development of techniques for measuring pilot workload.
   NASA CR-1688, Washington DC (1971)

10  J.M. Rolfe et al
    Evaluating measures of workload in a flight simulator.
    In: AGARD CP-146 Simulation and study of high workload operations,
    AGARD, Paris (1974)

11  A.H. Roscoe
    Assessing pilot workload in flight.
    In: Conference Proceedings No. 373, Flight Test Techniques,
    AGARD, Paris (1984)

12  A.H. Roscoe
    Pilot workload and economic Category 3 landings.
    In: Conference Pre-prints, Aerospace Medical Association Annual
    Scientific Meeting, Washington DC (1980)

13  W.A. Wainwright
    Flight test evaluation of crew workload.
    In: AGARDograph No 282 Roscoe A H (Ed) The practical assessment of
    pilot workload, AGARD, Paris (1987)

14  I.G. Lidderdale
    Measurement of aircrew workload during low-level flight.
    In: AGARDograph No 282 Roscoe A H (Ed) The practical assessment of
    pilot workload, AGARD, Paris (1987)

15  T.L. Saaty
    The analytical hierarchy process.
    McGraw-Hill (1980)

16  H.C. Muir, R. Elwell
    The assessment of workload in helicopters.
    In: Proceedings of RAE/AGARD/CIT Symposium - The practical
    assessment of pilot workload, Cranfield (1986)

17  R. Hames, R.C. Graeber
    Flight deck environment and working in North Sea helicopter
    operations.
    In: Conference Abstracts, Aerospace Medical Association Annual
    Scientific Meeting, Washington DC (1986)

18  A.H. Roscoe, B.S. Grieve
    The impact of new technology on pilot workload.
    SAE Technical Paper 861773 (1986)


19  A.H. Roscoe, B.S. Grieve
    Assessment of pilot workload during Boeing 767 normal and abnormal
    operating conditions.
    SAE Technical Paper 881382 (1988)

20  J.H. Skipper, C.A. Rieger, W.W. Wierwille
    Evaluation of decision-tree rating scales for mental workload
    estimation.
    Ergonomics, 29, 585-599 (1986)

21  G.B. Reid et al
    Development of multi-dimensional subjective measures of workload.
    Proceedings of the International Conference on Cybernetics and
    Society, 403-406, New York (1981)

22  S.G. Hart, L.E. Staveland
    Development of NASA-TLX (NASA-Task Load Index): Results of empirical
    and theoretical research.
    In: Human Mental Workload, Eds P.A. Hancock and N. Meshkati,
    North Holland Press, Amsterdam (1988)

23  G.B. Reid, C.A. Shingledecker, F.T. Eggemeier
    Application of conjoint measurement to workload scale development.
    In: Proceedings of the Human Factors Society - 25th Annual Meeting
    (1981)

24  F.V. Schick, R.L. Hann
    The use of subjective workload assessment techniques in a complex
    flight task.
    In: AGARDograph No 282 The practical assessment of pilot workload,
    Ed A.H. Roscoe, AGARD, Paris (1987)

25  M.R. Bortolussi, B.H. Kantowitz, S.G. Hart
    Measuring pilot workload in a motion base trainer. A comparison of
    four techniques.
    Proceedings of the Third Biannual Symposium on Aviation Psychology,
    Columbus, Ohio (1988)

26  W.W. Wierwille, J.H. Skipper, C.A. Rieger
    Decision-tree rating scales for workload estimation: Theme and
    variations.
    Proceedings of 20th Annual Conference on Manual Control,
    Sunnyvale (1984)

27  P. Tsang, W.W. Johnson
    Cognitive demands in automation.
    Aviat. Space Environ. Med., 60, 130-5 (1989)



REPORT  DOCUMENTATION  PAGE 

Overall  security  classification  of  this  page 

UNLIMITED

As far as possible this page should contain only unclassified information. If it is necessary to enter classified information, the box
above must be marked to indicate the classification, e.g. Restricted, Confidential or Secret.


1. DRIC Reference (to be added by DRIC)

2. Originator's Reference
RAE TR 90019

3. Agency Reference

4. Report Security Classification/Marking
UNLIMITED

5. DRIC Code for Originator
7672000H

5a. Sponsoring Agency's Code

6. Originator (Corporate Author) Name and Location
Royal Aerospace Establishment, Bedford, UK

6a. Sponsoring Agency (Contract Authority) Name and Location

7. Title
A subjective rating scale for assessing pilot workload in flight:
a decade of practical use

7a. (For Translations) Title in Foreign Language

7b. (For Conference Papers) Title, Place and Date of Conference

8. Author 1. Surname, Initials
Roscoe, A.H.

9a. Author 2
Ellis, G.A.

9b. Authors 3, 4 ....

10. Date
March 1990

Pages
16

Refs.
27

11. Contract Number

12. Period

13. Project

14. Other Reference Nos.
Flight Management 6

15. Distribution statement

(a) Controlled by -

(b) Special limitations (if any) -

If it is intended that a copy of this document shall be released overseas refer to RAE Leaflet No.3 to Supplement 6 of
MOD Manual 4.

16. Descriptors (Keywords) (Descriptors marked * are selected from TEST)

Workload. Flight. Pilot. Rating scale.

17. Abstract

Despite the many techniques developed for evaluating pilot workload in
flight, subjective assessment by experienced pilots is still the most reliable
method by far. This report describes the design and development - with the help
of practising test pilots - of a ten-point rating scale. The scale uses a
decision  tree  similar  to  that  used  by  the  Cooper-Harper  Handling  Qualities  scale, 
and  is  based  on  the  concept  of  spare  capacity.  Examples  are  given  of  its  use  by 
a  large  number  of  pilots  in  various  flight  trials  and  workload  studies. 




